Kohonen, T., Self-Organization and Associative Memory, Springer-Verlag, Berlin, 1984.
Fort, J.C., "Solving a Combinatorial Problem via Self-Organizing Process: An Application of the Kohonen Algorithm to the Traveling Salesman Problem," Biol. Cybern., 59, pp. 33-40, 1988.
The network consists of an array of ordered units, or cells, each with m inputs. The array is viewed as a map. One- and two-dimensional maps are particularly useful, but higher-dimensional arrays can be considered as well. Each cell independently receives the same m-dimensional input vectors.
In the simplest version of the algorithm proposed by Kohonen, upon receiving each data sample, the position in the array of the maximally responding cell, i.e., the cell whose weight vector most closely resembles the data vector, is identified. The learning process requires that the winning cell and its immediate neighbors adjust their weights in such a way as to reduce the difference between their weight vector and the data vector. The amount of reduction applied to the neighbors decreases with distance from the winning cell.
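Schematically, if x is the current data vector, w_i the weight vector of cell i, and c the index of the winning cell, the update just described can be sketched as

    w_i <- w_i + gain * h(i - c) * (x - w_i)

where h is a neighborhood profile equal to one at distance zero and decreasing with |i - c|, and gain scales the size of the correction. Both are developed below; the symbols here are illustrative, not the notebook's own notation.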
After repeated iterations, the locations of the peak responses will be
1) correctly ordered,
2) distributed (mapped) in a smooth fashion (the distribution will approximate the probability distribution of the data, to the extent that it was revealed by the samples drawn during the iterations), and
3) contracted around the mean of the distribution.
The amount of contraction varies inversely with the variability of the input data.
After global convergence, the process will continue to show small fluctuations, since each data sample disturbs the distribution slightly. The process may be terminated whenever a "satisfactory" level of spatial organization has been reached; what constitutes a satisfactory solution depends, of course, on the particular application. In practice, the best results are obtained when the amount of correction applied is decreased progressively during the iterations while, at the same time, the extent of the neighborhood is narrowed.
This learning rule involves a global operation, namely, the localization of the maximally responding cell. This may be inappropriate in some applications; it can be obviated by providing lateral connections between neighboring cells, with positive (excitatory) weights for the near neighbors that become negative (inhibitory) as distance increases.
In the retrieval or execution phase, data is applied to all units in parallel. In the absence of lateral connections, the units respond individually and independently of one another.
"Retrieval" consists of observing the position of the cell responding maximally on the map constituted by the cell array. This position topologically identifies the input vector parameters.
In this notebook the cells have two input ports, but the corresponding data vector is constrained to have unit length. The angular position theta of this vector around the unit circle will be the only independent parameter. Thus the input domain is one-dimensional and can be visualized as points on the circle.
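As a minimal sketch (the names theta and datum follow the text; the sampling range is an assumption), such a datum can be generated by

    theta = RandomReal[{-Pi, Pi}];
    datum = {Cos[theta], Sin[theta]};   (* unit-length input vector *)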
We now establish a one-dimensional array of cells by creating a (numData x 2) array, weightVectors, of weights for the two inputs.
The initial values for the weights will be chosen as random angles around the circle and will form vectors of unit length, directly comparable to the data vectors.
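A sketch of this initialization (numData = 25 is an assumed size; the names follow the text):

    numData = 25;
    weightAngles = RandomReal[{-Pi, Pi}, numData];                      (* random initial angles *)
    weightVectors = Transpose[{Cos[weightAngles], Sin[weightAngles]}];  (* numData x 2 array of unit vectors *)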
According to the learning rule, for each presentation of a datum vector, the adjustment can be pictured as a rotation, toward the datum vector, of the winning cell's weight vector and also, to a lesser extent, of its neighbors' weight vectors.
The amount of "pull" must decrease with distance.
We will choose a pullProfile function of distance that describes the relative amount of weight modification imposed on the neighbors. It will be an even function with value one at distance zero, decreasing monotonically with the absolute value of distance.
A simple exponentially decaying function will do. The parameter spread will let us adjust the extent of interaction. We plot the function with spread=0.5.
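A sketch of one such choice (the exact functional form in the original may differ):

    spread = 0.5;
    pullProfile[d_] := Exp[-Abs[d]/spread];   (* even, unity at zero, decaying with |d| *)
    Plot[pullProfile[d], {d, -3, 3}]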
To get a discrete set of influence coefficients to apply to the cell array, we make a table of the pullProfile function, with an odd number of entries, centered on zero distance (the winning cell). The parameter pullExtent sets the number of neighbors on each side of the winning cell that will receive an adjustment.
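For example (pullExtent = 3 is an assumed value; distance is measured in cells):

    pullExtent = 3;
    pullTable = Table[pullProfile[d], {d, -pullExtent, pullExtent}];   (* 2 pullExtent + 1 entries, centered on zero *)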
To apply the adjustment in parallel to the whole array of cells, we need to insert these influence coefficients into an array of length numData. Most of the entries will be zero since only the immediate neighbors are affected.
Finally, to facilitate subsequent indexing, we rotate this array to place the maximum value (unity) in the first position. During the calculations, it will be shifted by the amount necessary to bring it into alignment with the "winning" cell, i.e., the cell displaying the maximum netInput.
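Both steps can be sketched as

    pullFactors = RotateLeft[
        PadRight[pullTable, numData],   (* zeros everywhere except the neighborhood entries *)
        pullExtent];                    (* bring the central (unity) coefficient to position 1 *)

Here PadRight places the table at the head of the array; the rotation by pullExtent then moves the unity entry to the first position.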
We just need the number inside the list (Flatten will do) and we should allow for the unlikely event of more than one maximum, in which case we will just take the first to show up in the array.
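A sketch of the winner search (netInputs is an assumed name for the array of responses to the current datum):

    netInputs = weightVectors . datum;                              (* parallel response of all cells *)
    winner = First[Flatten[Position[netInputs, Max[netInputs]]]];   (* index of the first maximal response *)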
We are now in a position to adjust the weights in parallel across the entire weightVectors array. Specifically, we will rotate the weightAngles toward the datum angle theta, in proportions set by the pullFactors array and multiplied by a gain parameter, gain, that controls the size of the successive adjustments. For now, gain is set to a constant value.
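A sketch of the parallel update (gain = 0.5 is an assumed value; the Mod expression yields the signed shortest rotation from each weight angle to theta):

    gain = 0.5;
    delta = Mod[theta - weightAngles + Pi, 2 Pi] - Pi;                (* angular differences in [-Pi, Pi) *)
    weightAngles += gain*delta*RotateRight[pullFactors, winner - 1];  (* align the coefficients with the winner *)
    weightVectors = Transpose[{Cos[weightAngles], Sin[weightAngles]}];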
We can see that the vector representing the maximally responding cell has moved closer to the datum. We can also look at the current state of the entire array by showing the irregular polygon formed by the weight vectors of successive cells in the array.
We will now reset the problem parameters and iterate the process to see how the cell weights (represented as points around the circle) self-organize when input data from a uniform distribution are applied repeatedly and the weights are corrected according to the rules just described.
The output graphs can be animated to better illustrate the transformation.
We will now iterate the network. To avoid an excessive number of graphics outputs, we will look at the configuration only periodically: after every iterInterval steps we plot the state, and we continue the process for numSnap successive such iteration intervals.
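A sketch of the iteration schedule (iterInterval = 50 and numSnap = 10 are assumed values; the body repeats the steps developed above):

    iterInterval = 50; numSnap = 10;
    Do[
      Do[
        theta = RandomReal[{-Pi, Pi}]; datum = {Cos[theta], Sin[theta]};
        netInputs = weightVectors . datum;
        winner = First[Flatten[Position[netInputs, Max[netInputs]]]];
        delta = Mod[theta - weightAngles + Pi, 2 Pi] - Pi;
        weightAngles += gain*delta*RotateRight[pullFactors, winner - 1];
        weightVectors = Transpose[{Cos[weightAngles], Sin[weightAngles]}],
        {iterInterval}];
      (* plot the polygon of weight vectors after each interval *)
      Print[Graphics[Line[Append[weightVectors, First[weightVectors]]],
        AspectRatio -> Automatic, PlotRange -> {{-1.2, 1.2}, {-1.2, 1.2}}]],
      {numSnap}]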
We need to introduce two additional heuristics. At the beginning, the network is very disorganized and large corrections are desirable; these should be progressively reduced as spatial organization settles in. This can be accomplished by applying a decay factor to the gain parameter gain. To further improve convergence, it is also desirable to start with a broad spatial spread of adjustment (determined by the value of pullExtent) and to narrow it over time; since this would require recalculating the correction table periodically, it will be omitted at this time.
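The first heuristic can be sketched as a single extra line in the loop above (gainDecay = 0.995 is an assumed value):

    gainDecay = 0.995;
    gain *= gainDecay;   (* apply once per learning step to shrink successive corrections *)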